WordCloud of sleep text scores

In [1]:
%%bash
sudo apt update
sudo apt install fonts-ipaexfont  # for Japanese in wordcloud
Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
Get:2 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Get:4 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Hit:5 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:6 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ Packages [61.7 kB]
Get:7 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:8 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1,199 kB]
Hit:9 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Get:12 https://r2u.stat.illinois.edu/ubuntu jammy/main amd64 Packages [2,631 kB]
Get:13 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:14 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [3,513 kB]
Get:15 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [2,554 kB]
Get:16 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1,226 kB]
Get:17 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1,517 kB]
Get:18 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [2,854 kB]
Get:19 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [3,652 kB]
Fetched 19.6 MB in 5s (4,022 kB/s)
Reading package lists...
Building dependency tree...
Reading state information...
52 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  fonts-ipaexfont-gothic fonts-ipaexfont-mincho
The following NEW packages will be installed:
  fonts-ipaexfont fonts-ipaexfont-gothic fonts-ipaexfont-mincho
0 upgraded, 3 newly installed, 0 to remove and 52 not upgraded.
Need to get 7,954 kB of archives.
After this operation, 14.1 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 fonts-ipaexfont-gothic all 00401-3ubuntu1 [3,341 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 fonts-ipaexfont-mincho all 00401-3ubuntu1 [4,604 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/universe amd64 fonts-ipaexfont all 00401-3ubuntu1 [8,428 B]
Fetched 7,954 kB in 2s (4,636 kB/s)
Selecting previously unselected package fonts-ipaexfont-gothic.
(Reading database ... 123632 files and directories currently installed.)
Preparing to unpack .../fonts-ipaexfont-gothic_00401-3ubuntu1_all.deb ...
Unpacking fonts-ipaexfont-gothic (00401-3ubuntu1) ...
Selecting previously unselected package fonts-ipaexfont-mincho.
Preparing to unpack .../fonts-ipaexfont-mincho_00401-3ubuntu1_all.deb ...
Unpacking fonts-ipaexfont-mincho (00401-3ubuntu1) ...
Selecting previously unselected package fonts-ipaexfont.
Preparing to unpack .../fonts-ipaexfont_00401-3ubuntu1_all.deb ...
Unpacking fonts-ipaexfont (00401-3ubuntu1) ...
Setting up fonts-ipaexfont-mincho (00401-3ubuntu1) ...
update-alternatives: using /usr/share/fonts/opentype/ipaexfont-mincho/ipaexm.ttf to provide /usr/share/fonts/truetype/fonts-japanese-mincho.ttf (fonts-japanese-mincho.ttf) in auto mode
Setting up fonts-ipaexfont-gothic (00401-3ubuntu1) ...
update-alternatives: using /usr/share/fonts/opentype/ipaexfont-gothic/ipaexg.ttf to provide /usr/share/fonts/truetype/fonts-japanese-gothic.ttf (fonts-japanese-gothic.ttf) in auto mode
Setting up fonts-ipaexfont (00401-3ubuntu1) ...
Processing triggers for fontconfig (2.13.1-4.2ubuntu5) ...
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 78, <> line 3.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
In [2]:
!pip install wordcloud
!pip install japanize-matplotlib  # for Japanese in matplotlib graph
Requirement already satisfied: wordcloud in /usr/local/lib/python3.10/dist-packages (1.9.4)
Requirement already satisfied: numpy>=1.6.1 in /usr/local/lib/python3.10/dist-packages (from wordcloud) (1.26.4)
Requirement already satisfied: pillow in /usr/local/lib/python3.10/dist-packages (from wordcloud) (11.0.0)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from wordcloud) (3.8.0)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->wordcloud) (1.3.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->wordcloud) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->wordcloud) (4.55.3)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->wordcloud) (1.4.7)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->wordcloud) (24.2)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->wordcloud) (3.2.0)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib->wordcloud) (2.8.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib->wordcloud) (1.17.0)
Collecting japanize-matplotlib
  Downloading japanize-matplotlib-1.1.3.tar.gz (4.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.1/4.1 MB 47.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from japanize-matplotlib) (3.8.0)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->japanize-matplotlib) (1.3.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->japanize-matplotlib) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->japanize-matplotlib) (4.55.3)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->japanize-matplotlib) (1.4.7)
Requirement already satisfied: numpy<2,>=1.21 in /usr/local/lib/python3.10/dist-packages (from matplotlib->japanize-matplotlib) (1.26.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->japanize-matplotlib) (24.2)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->japanize-matplotlib) (11.0.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->japanize-matplotlib) (3.2.0)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib->japanize-matplotlib) (2.8.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib->japanize-matplotlib) (1.17.0)
Building wheels for collected packages: japanize-matplotlib
  Building wheel for japanize-matplotlib (setup.py) ... done
  Created wheel for japanize-matplotlib: filename=japanize_matplotlib-1.1.3-py3-none-any.whl size=4120257 sha256=bb9a98e59f51b2826d6709f8f6d1b1b9fd0772e6476dbd92570e78346c268d8c
  Stored in directory: /root/.cache/pip/wheels/61/7a/6b/df1f79be9c59862525070e157e62b08eab8ece27c1b68fbb94
Successfully built japanize-matplotlib
Installing collected packages: japanize-matplotlib
Successfully installed japanize-matplotlib-1.1.3

Import libraries

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import japanize_matplotlib  # for Japanese in matplotlib graph
from wordcloud import WordCloud, STOPWORDS

Set up working directory

In [4]:
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/Documents/ds2024/dsF1/
Mounted at /content/drive
/content/drive/MyDrive/Documents/ds2024/dsF1

Parameters

In [5]:
csv_in = 'sleep-text-score-wakati.csv'

Read CSV file

In [6]:
df = pd.read_csv(csv_in, sep=',', skiprows=0, header=0)
print(df.shape)
df.info()  # info() prints its report itself and returns None, so no print() is needed
display(df.head())
(426, 4)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 426 entries, 0 to 425
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   text               426 non-null    object
 1   GPT-4o             426 non-null    int64 
 2   Gemini-1.5-Pro     426 non-null    int64 
 3   Claude-3.5-Sonnet  426 non-null    int64 
dtypes: int64(3), object(1)
memory usage: 13.4+ KB
   text                                      GPT-4o  Gemini-1.5-Pro  Claude-3.5-Sonnet
0  就寝 時間 毎日 一定 する                  2       2               2
1  朝日 積極的 浴びる                        2       2               2
2  寝室 温度 18 -22度 保つ                   2       2               2
3  就寝 前 ストレッチ 体 リラックス さ せる  2       2               2
4  寝具 定期的 清潔 保つ                     2       2               2

Check the number of documents in each category

In [7]:
print(df['GPT-4o'].value_counts().sort_index(ascending=True))
GPT-4o
0    164
1     61
2    201
Name: count, dtype: int64
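The cell above counts labels for the GPT-4o column only. The CSV also has Gemini-1.5-Pro and Claude-3.5-Sonnet columns, so a side-by-side count may be useful. The sketch below assumes a small made-up frame (`sample`) in place of the real data, and `label_counts` is a hypothetical helper name, not part of the notebook.

```python
import pandas as pd

# Made-up stand-in for the real CSV: same score column names, invented rows.
sample = pd.DataFrame({
    'GPT-4o':            [2, 2, 0, 1, 0],
    'Gemini-1.5-Pro':    [2, 1, 0, 1, 0],
    'Claude-3.5-Sonnet': [2, 2, 0, 0, 0],
})

def label_counts(frame, score_cols):
    """Return a table with one row per label and one column per scorer."""
    counts = {col: frame[col].value_counts() for col in score_cols}
    # Labels a scorer never used come out as NaN; treat them as zero.
    return pd.DataFrame(counts).fillna(0).astype(int).sort_index()

counts = label_counts(sample, list(sample.columns))
print(counts)
```

On the real data, `label_counts(df, ['GPT-4o', 'Gemini-1.5-Pro', 'Claude-3.5-Sonnet'])` should give the same kind of table.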

Generating WordCloud

In [8]:
fpath = "/usr/share/fonts/opentype/ipaexfont-gothic/ipaexg.ttf"
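The font path is hard-coded, so a missing font only shows up when WordCloud tries to load it. A minimal sketch of an up-front check follows. The second candidate is the alternatives symlink reported in the install log above. `first_existing_font`, `font_candidates`, and `checked_fpath` are hypothetical names used only for this illustration.

```python
from pathlib import Path

def first_existing_font(candidates):
    """Return the first path in `candidates` that exists as a file, else None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None

# Both paths appear in the apt install log; other systems may differ.
font_candidates = [
    "/usr/share/fonts/opentype/ipaexfont-gothic/ipaexg.ttf",
    "/usr/share/fonts/truetype/fonts-japanese-gothic.ttf",
]

checked_fpath = first_existing_font(font_candidates)
if checked_fpath is None:
    print("No Japanese font found; install fonts-ipaexfont first.")
else:
    print(f"Using font: {checked_fpath}")
```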
In [11]:
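# Draw one word cloud per score label, in ascending label order
sorted_labels = sorted(df['GPT-4o'].unique())

for label in sorted_labels:
    # Join all tokenized texts with this label into one space-separated string
    text_data = df[df['GPT-4o'] == label]['text'].str.cat(sep=' ')

    # font_path points to a Japanese-capable font; without one,
    # Japanese characters may render as empty boxes
    wc = WordCloud(width=800, height=400, background_color='white',
                   font_path=fpath).generate(text_data)

    plt.figure(figsize=(10, 5))
    plt.imshow(wc)
    plt.axis('off')
    plt.title(f'Word Cloud for Label: {label}')
    plt.show()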
[Figure: three word clouds, one each for labels 0, 1, and 2]
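The cells that follow exclude a growing list of words from the clouds. One possible way to find candidates is to look for tokens that rank among the most frequent under every label, since such tokens do little to distinguish the labels. This is a sketch only and not necessarily how the list in this notebook was chosen. `common_tokens` is a hypothetical helper, and the sample texts are made up in the same space-separated style as the `text` column.

```python
from collections import Counter

def common_tokens(texts_by_label, top_n=3):
    """Return tokens that are in the top_n most frequent for every label."""
    top_sets = []
    for texts in texts_by_label.values():
        # Texts are already tokenized and space-separated, so split() is enough.
        counts = Counter(token for text in texts for token in text.split())
        top_sets.append({token for token, _ in counts.most_common(top_n)})
    return set.intersection(*top_sets) if top_sets else set()

# Made-up example documents, keyed by label.
sample_texts = {
    0: ["寝る 前 スマホ 見る", "寝る 前 コーヒー 飲む"],
    2: ["寝る 前 ストレッチ", "寝る 前 読書"],
}

candidates = common_tokens(sample_texts, top_n=2)
print(sorted(candidates))
```

With the real data, the input could be built as `{label: df[df['GPT-4o'] == label]['text'].tolist() for label in sorted_labels}`.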
In [13]:
# Words to hide from the clouds: sleep, before, just before, do, going to bed, bedroom
excluded_words = {'寝る', '前', '直前', 'する', '就寝', '寝室'}

sorted_labels = sorted(df['GPT-4o'].unique())

for label in sorted_labels:
    text_data = df[df['GPT-4o'] == label]['text'].str.cat(sep=' ')

    wc = WordCloud(width=800, height=400, background_color='white',
                   font_path=fpath, stopwords=STOPWORDS.union(excluded_words)).generate(text_data)

    plt.figure(figsize=(10, 5))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.title(f'Word Cloud for Label: {label}')
    plt.show()
[Figure: three word clouds, one each for labels 0, 1, and 2]
In [15]:
# Same list as before, plus: night (夜), see/watch (見る)
excluded_words = {'寝る', '前', '直前', 'する', '就寝', '寝室', '夜', '見る'}

sorted_labels = sorted(df['GPT-4o'].unique())

for label in sorted_labels:
    text_data = df[df['GPT-4o'] == label]['text'].str.cat(sep=' ')

    wc = WordCloud(width=800, height=400, background_color='white',
                   font_path=fpath, stopwords=STOPWORDS.union(excluded_words)).generate(text_data)

    plt.figure(figsize=(10, 5))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.title(f'Word Cloud for Label: {label}')
    plt.show()
[Figure: three word clouds, one each for labels 0, 1, and 2]
In [16]:
# Same list as before, plus: light (軽い), enjoy (楽しむ)
excluded_words = {'寝る', '前', '直前', 'する', '就寝', '寝室', '夜', '見る', '軽い', '楽しむ'}

sorted_labels = sorted(df['GPT-4o'].unique())

for label in sorted_labels:
    text_data = df[df['GPT-4o'] == label]['text'].str.cat(sep=' ')

    wc = WordCloud(width=800, height=400, background_color='white',
                   font_path=fpath, stopwords=STOPWORDS.union(excluded_words)).generate(text_data)

    plt.figure(figsize=(10, 5))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.title(f'Word Cloud for Label: {label}')
    plt.show()
[Figure: three word clouds, one each for labels 0, 1, and 2]
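Each of the cells above re-filters the frame and re-joins the text for every label. As a possible simplification, the joined text per label could be computed once and reused while only the excluded words change. The sketch below uses a made-up frame with the same column names. `texts_by_label` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def texts_by_label(frame, label_col, text_col='text'):
    """Join all documents of each label into one space-separated string."""
    grouped = frame.groupby(label_col)[text_col].agg(' '.join)
    return grouped.sort_index().to_dict()

# Made-up rows with the same column names as the CSV.
sample = pd.DataFrame({
    'text': ['寝る 前 スマホ', '朝日 浴びる', '夜 コーヒー'],
    'GPT-4o': [0, 2, 0],
})

joined = texts_by_label(sample, 'GPT-4o')
for label, text_data in joined.items():
    print(label, text_data)
```

Each `text_data` value could then be passed to `WordCloud(...).generate(text_data)` in the same way as in the loops above.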